Overview

Dataset statistics

Number of variables: 15
Number of observations: 17050
Missing cells: 57097
Missing cells (%): 22.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.0 MiB
Average record size in memory: 120.0 B
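
The missing-cell figures can be cross-checked against the per-column counts listed under Alerts. The snippet below is a small arithmetic sketch, not part of the profiling tool; the dictionary simply copies the counts reported in this document.

```python
# Per-column missing counts as reported in the Alerts section of this report.
missing_per_column = {
    "TRAV": 1415, "TRAJ": 1994, "TRBV": 1259, "TRBJ": 1483,
    "TRAC": 17050, "TRBC": 17050, "MHC A": 1276, "MHC B": 15570,
}
n_variables = 15
n_observations = 17050

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, round(missing_pct, 1))  # 57097 22.3
```

The two fully empty columns (TRAC and TRBC) account for 34100 of the 57097 missing cells.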

Variable types

Numeric: 2
Text: 9
Unsupported: 2
Categorical: 2

Alerts

Binding has constant value "1" (Constant)
TCR_name is highly overall correlated with task (High correlation)
Unnamed: 0 is highly overall correlated with task (High correlation)
task is highly overall correlated with TCR_name and Unnamed: 0 (High correlation)
TRAV has 1415 (8.3%) missing values (Missing)
TRAJ has 1994 (11.7%) missing values (Missing)
TRBV has 1259 (7.4%) missing values (Missing)
TRBJ has 1483 (8.7%) missing values (Missing)
TRAC has 17050 (100.0%) missing values (Missing)
TRBC has 17050 (100.0%) missing values (Missing)
MHC A has 1276 (7.5%) missing values (Missing)
MHC B has 15570 (91.3%) missing values (Missing)
Unnamed: 0 is uniformly distributed (Uniform)
Unnamed: 0 has unique values (Unique)
TCR_name has unique values (Unique)
TRAC is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
TRBC is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
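
Several of these alerts (constant Binding, fully missing TRAC and TRBC) point at columns that carry no information within this table. A minimal sketch of how such columns could be detected is shown below. It uses plain Python containers so it runs without dependencies; the `table` layout, the helper names and the toy values are hypothetical, and whether a column should really be dropped depends on the downstream use (Binding, for example, may be a label).

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def columns_to_drop(table):
    """Return names of columns that are entirely missing or hold one constant value.

    `table` maps column name -> list of cell values (a hypothetical in-memory layout).
    """
    drop = []
    for name, values in table.items():
        present = [v for v in values if not is_missing(v)]
        if not present:
            drop.append(name)  # 100% missing, like TRAC and TRBC
        elif len(present) == len(values) and len(set(present)) == 1:
            drop.append(name)  # constant with no gaps, like Binding
    return drop

# Toy data that mimics the alert pattern; not taken from the dataset.
toy = {
    "TRAV": ["TRAV12-2*01", None, "TRAV19*01"],
    "TRAC": [None, None, None],
    "Binding": [1, 1, 1],
    "task": ["TPP2", "TPP2", "TPP1"],
}
print(columns_to_drop(toy))  # ['TRAC', 'Binding']
```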

Reproduction

Analysis started: 2024-04-18 08:20:19.811901
Analysis finished: 2024-04-18 08:20:21.956461
Duration: 2.14 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ)

HIGH CORRELATION  UNIFORM  UNIQUE 

Distinct: 17050
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8524.5
Minimum: 0
Maximum: 17049
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 133.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 852.45
Q1: 4262.25
Median: 8524.5
Q3: 12786.75
95-th percentile: 16196.55
Maximum: 17049
Range: 17049
Interquartile range (IQR): 8524.5

Descriptive statistics

Standard deviation: 4922.0554
Coefficient of variation (CV): 0.57740107
Kurtosis: -1.2
Mean: 8524.5
Median Absolute Deviation (MAD): 4262.5
Skewness: 0
Sum: 1.4534272 × 10^8
Variance: 24226629
Monotonicity: Strictly increasing

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
11371 | 1 | < 0.1%
11357 | 1 | < 0.1%
11358 | 1 | < 0.1%
11359 | 1 | < 0.1%
11360 | 1 | < 0.1%
11361 | 1 | < 0.1%
11362 | 1 | < 0.1%
11363 | 1 | < 0.1%
11364 | 1 | < 0.1%
Other values (17040) | 17040 | 99.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
17049 | 1 | < 0.1%
17048 | 1 | < 0.1%
17047 | 1 | < 0.1%
17046 | 1 | < 0.1%
17045 | 1 | < 0.1%
17044 | 1 | < 0.1%
17043 | 1 | < 0.1%
17042 | 1 | < 0.1%
17041 | 1 | < 0.1%
17040 | 1 | < 0.1%
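
The statistics reported for Unnamed: 0 (minimum 0, maximum 17049, 17050 distinct values, strictly increasing) are what a plain row index 0, 1, ..., n-1 would produce. The sketch below checks this under that assumption; it does not read the dataset, it only regenerates the index and compares against the closed forms for consecutive integers.

```python
import math
import statistics

n = 17050                    # number of observations in the report
values = list(range(n))      # assumed content of "Unnamed: 0": 0, 1, ..., n-1

mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (divides by n - 1)
std = math.sqrt(variance)
q05 = 0.05 * (n - 1)                     # 5th percentile under linear interpolation

# Closed forms for the integers 0..n-1
assert mean == (n - 1) / 2
assert math.isclose(variance, n * (n + 1) / 12)

print(mean, round(variance), round(std, 4), round(q05, 2))
```

The results agree with the reported mean (8524.5), variance (24226629), standard deviation (4922.0554) and 5-th percentile (852.45). The zero skewness and kurtosis of about -1.2 are likewise the values expected for a uniform sequence, so the column most likely carries no information beyond row order.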

TCR_name
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 17050
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20136.679
Minimum: 1
Maximum: 56833
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 133.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 927.45
Q1: 4385.25
Median: 11449.5
Q3: 32285.75
95-th percentile: 55365.55
Maximum: 56833
Range: 56832
Interquartile range (IQR): 27900.5

Descriptive statistics

Standard deviation: 19032.705
Coefficient of variation (CV): 0.94517596
Kurtosis: -0.87959122
Mean: 20136.679
Median Absolute Deviation (MAD): 9542
Skewness: 0.76773102
Sum: 3.4333037 × 10^8
Variance: 3.6224384 × 10^8
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
4 | 1 | < 0.1%
55398 | 1 | < 0.1%
55384 | 1 | < 0.1%
55385 | 1 | < 0.1%
55386 | 1 | < 0.1%
55387 | 1 | < 0.1%
55388 | 1 | < 0.1%
55389 | 1 | < 0.1%
55390 | 1 | < 0.1%
55391 | 1 | < 0.1%
Other values (17040) | 17040 | 99.9%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
13 | 1 | < 0.1%
14 | 1 | < 0.1%
15 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
56833 | 1 | < 0.1%
56832 | 1 | < 0.1%
56829 | 1 | < 0.1%
56828 | 1 | < 0.1%
56827 | 1 | < 0.1%
56826 | 1 | < 0.1%
56825 | 1 | < 0.1%
56824 | 1 | < 0.1%
56823 | 1 | < 0.1%
56822 | 1 | < 0.1%

TRAV
Text

MISSING 

Distinct: 241
Distinct (%): 1.5%
Missing: 1415
Missing (%): 8.3%
Memory size: 133.3 KiB

Length

Max length: 20
Median length: 17
Mean length: 9.4722098
Min length: 5

Characters and Unicode

Total characters: 148098
Distinct characters: 27
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 62
Unique (%): 0.4%

Sample

1st row: TRAV38-2/DV8*01
2nd row: TRAV38-1*01
3rd row: TRAV12-2*01
4th row: TRAV12-2*01
5th row: TRAV12-2*01

Value | Count | Frequency (%)
trav12-2*01 | 1018 | 6.5%
trav12-2 | 908 | 5.8%
trav13-1*01 | 626 | 4.0%
trav19*01 | 600 | 3.8%
trav27*01 | 547 | 3.5%
trav21*01 | 509 | 3.3%
trav1-2*01 | 495 | 3.2%
trav29/dv5*01 | 475 | 3.0%
trav14/dv4*01 | 466 | 3.0%
trav5*01 | 404 | 2.6%
Other values (229) | 9592 | 61.3%

Most occurring characters

Value | Count | Frequency (%)
1 | 21971 | 14.8%
V | 17320 | 11.7%
T | 15628 | 10.6%
R | 15628 | 10.6%
A | 15621 | 10.5%
0 | 12541 | 8.5%
* | 11960 | 8.1%
2 | 10776 | 7.3%
- | 7597 | 5.1%
3 | 4367 | 2.9%
Other values (17) | 14689 | 9.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 148098 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 148098 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 148098 | 100.0%
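
The TRAV value table counts "trav12-2*01" (1018) and "trav12-2" (908) as separate values, which suggests the column mixes allele-level and gene-level annotations. One way to compare them on equal footing is to collapse every entry to gene level by dropping the part after the asterisk. The helper name `gene_level` is hypothetical, and this is only a sketch of one possible normalisation, not a step the report itself performs.

```python
from collections import Counter

def gene_level(name):
    """Drop the allele suffix (the part after '*') and upper-case the gene name."""
    return name.split("*", 1)[0].upper()

# Counts copied from the TRAV value table in this report (top entries only).
counts = {"trav12-2*01": 1018, "trav12-2": 908, "trav13-1*01": 626}

merged = Counter()
for name, count in counts.items():
    merged[gene_level(name)] += count

print(merged.most_common(1))  # [('TRAV12-2', 1926)]
```

The same split also applies to names with a slash, for example `gene_level("TRAV38-2/DV8*01")` gives `"TRAV38-2/DV8"`.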

TRAJ
Text

MISSING 

Distinct: 178
Distinct (%): 1.2%
Missing: 1994
Missing (%): 11.7%
Memory size: 133.3 KiB

Length

Max length: 11
Median length: 9
Mean length: 8.2304729
Min length: 5

Characters and Unicode

Total characters: 123918
Distinct characters: 19
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): 0.2%

Sample

1st row: TRAJ40*01
2nd row: TRAJ48*01
3rd row: TRAJ42*01
4th row: TRAJ48*01
5th row: TRAJ42*01

Value | Count | Frequency (%)
traj42*01 | 1142 | 7.6%
traj52*01 | 474 | 3.1%
traj33*01 | 436 | 2.9%
traj45*01 | 432 | 2.9%
traj20*01 | 396 | 2.6%
traj49*01 | 376 | 2.5%
traj42 | 372 | 2.5%
traj37*01 | 350 | 2.3%
traj30*01 | 347 | 2.3%
traj50*01 | 342 | 2.3%
Other values (165) | 10389 | 69.0%

Most occurring characters

Value | Count | Frequency (%)
T | 15056 | 12.1%
A | 15056 | 12.1%
R | 15056 | 12.1%
J | 15054 | 12.1%
1 | 14224 | 11.5%
0 | 13501 | 10.9%
* | 11621 | 9.4%
4 | 5753 | 4.6%
2 | 5345 | 4.3%
3 | 5311 | 4.3%
Other values (9) | 7941 | 6.4%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 123918 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 123918 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 123918 | 100.0%

TRA_CDR3
Text

Distinct: 12741
Distinct (%): 74.7%
Missing: 0
Missing (%): 0.0%
Memory size: 133.3 KiB

Length

Max length: 30
Median length: 26
Mean length: 13.402346
Min length: 3

Characters and Unicode

Total characters: 228510
Distinct characters: 25
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11206
Unique (%): 65.7%

Sample

1st row: CAYRPPGTYKYIF
2nd row: CAYTVLGNEKLTF
3rd row: CAVAGYGGSQGNLIF
4th row: CAVSFGNEKLTF
5th row: CAVTHYGGSQGNLIF

Value | Count | Frequency (%)
caglnyggsqgnlif | 102 | 0.6%
caasetsydkvif | 97 | 0.6%
cagqnyggsqgnlif | 95 | 0.6%
cadsgggadgltf | 80 | 0.5%
cagmnyggsqgnlif | 77 | 0.5%
caigpgnmltf | 72 | 0.4%
cagggsqgnlif | 71 | 0.4%
cavdlmktsydkvif | 71 | 0.4%
cavgdnfnkfyf | 39 | 0.2%
cagagsqgnlif | 35 | 0.2%
Other values (12731) | 16311 | 95.7%

Most occurring characters

Value | Count | Frequency (%)
G | 29140 | 12.8%
A | 24526 | 10.7%
F | 19593 | 8.6%
L | 17353 | 7.6%
C | 16573 | 7.3%
S | 15848 | 6.9%
N | 14483 | 6.3%
T | 13452 | 5.9%
V | 11731 | 5.1%
K | 10240 | 4.5%
Other values (15) | 55571 | 24.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 228510 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 228510 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 228510 | 100.0%
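
The alpha-chain CDR3 column (TRA_CDR3 in the Sample table) is reported with 25 distinct characters and a minimum length of 3. Since there are only 20 standard amino-acid letters, some entries probably contain other symbols or are truncated. The sketch below shows one way such rows could be flagged; the function name, the `min_length` threshold and the second example string are illustrative assumptions, not values taken from the report.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def cdr3_issues(seq, min_length=6):
    """Return a list of reasons why `seq` looks suspicious (empty list if none).

    `min_length` is an illustrative threshold, not a value from the report.
    """
    issues = []
    unexpected = sorted(set(seq.upper()) - STANDARD_AMINO_ACIDS)
    if unexpected:
        issues.append("non-standard characters: " + "".join(unexpected))
    if len(seq) < min_length:
        issues.append(f"shorter than {min_length} residues")
    return issues

print(cdr3_issues("CAYRPPGTYKYIF"))  # sample row from the report: []
print(cdr3_issues("CA*"))            # made-up entry for illustration
```

Flagged rows would still need manual review, because the report does not say which extra characters occur or why.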

TRBV
Text

MISSING 

Distinct: 225
Distinct (%): 1.4%
Missing: 1259
Missing (%): 7.4%
Memory size: 133.3 KiB

Length

Max length: 20
Median length: 19
Mean length: 9.0155152
Min length: 5

Characters and Unicode

Total characters: 142364
Distinct characters: 22
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 52
Unique (%): 0.3%

Sample

1st row: TRBV14*01
2nd row: TRBV28*01
3rd row: TRBV28*01
4th row: TRBV28*01
5th row: TRBV28*01

Value | Count | Frequency (%)
trbv19*01 | 1594 | 10.1%
trbv20-1*01 | 683 | 4.3%
trbv27*01 | 629 | 4.0%
trbv7-9*01 | 625 | 4.0%
trbv9*01 | 573 | 3.6%
trbv11-2*01 | 432 | 2.7%
trbv4-1*01 | 408 | 2.6%
trbv28*01 | 381 | 2.4%
trbv6-5*01 | 378 | 2.4%
trbv2*01 | 332 | 2.1%
Other values (213) | 9757 | 61.8%

Most occurring characters

Value | Count | Frequency (%)
1 | 20733 | 14.6%
R | 15792 | 11.1%
T | 15791 | 11.1%
B | 15791 | 11.1%
V | 15791 | 11.1%
0 | 13507 | 9.5%
* | 11977 | 8.4%
- | 9528 | 6.7%
2 | 6032 | 4.2%
9 | 3993 | 2.8%
Other values (12) | 13429 | 9.4%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 142364 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 142364 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 142364 | 100.0%

TRBJ
Text

MISSING 

Distinct: 60
Distinct (%): 0.4%
Missing: 1483
Missing (%): 8.7%
Memory size: 133.3 KiB

Length

Max length: 12
Median length: 10
Mean length: 9.2960108
Min length: 5

Characters and Unicode

Total characters: 144711
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 0.1%

Sample

1st row: TRBJ2-1*01
2nd row: TRBJ2-1*01
3rd row: TRBJ1-1*01
4th row: TRBJ1-5*01
5th row: TRBJ2-3*01

Value | Count | Frequency (%)
trbj2-7*01 | 2273 | 14.6%
trbj2-1*01 | 1740 | 11.2%
trbj1-2*01 | 1420 | 9.1%
trbj2-3*01 | 1371 | 8.8%
trbj1-1*01 | 1284 | 8.2%
trbj2-2*01 | 1063 | 6.8%
trbj2-5*01 | 803 | 5.2%
trbj1-5*01 | 732 | 4.7%
trbj2-7 | 613 | 3.9%
trbj1-2 | 537 | 3.4%
Other values (46) | 3731 | 24.0%

Most occurring characters

Value | Count | Frequency (%)
1 | 21592 | 14.9%
T | 15567 | 10.8%
R | 15567 | 10.8%
B | 15567 | 10.8%
J | 15567 | 10.8%
- | 15354 | 10.6%
2 | 13199 | 9.1%
0 | 11970 | 8.3%
* | 11970 | 8.3%
7 | 2925 | 2.0%
Other values (7) | 5433 | 3.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 144711 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 144711 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 144711 | 100.0%

TRB_CDR3
Text

Distinct: 13631
Distinct (%): 79.9%
Missing: 0
Missing (%): 0.0%
Memory size: 133.3 KiB

Length

Max length: 29
Median length: 25
Mean length: 14.249795
Min length: 4

Characters and Unicode

Total characters: 242959
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12346
Unique (%): 72.4%

Sample

1st row: CASSALASLNEQFF
2nd row: CASSFTPYNEQFF
3rd row: CASSPQGLGTEAFF
4th row: CAEGQGFVGQPQHF
5th row: CASLRSAVWADTQYF

Value | Count | Frequency (%)
cassirssyeqyf | 189 | 1.1%
casswgggshygytf | 146 | 0.9%
cassfsgntgelff | 97 | 0.6%
casslrdgseaff | 86 | 0.5%
cassirsayeqyf | 42 | 0.2%
csvdleanygytf | 31 | 0.2%
cassirstdtqyf | 28 | 0.2%
cassarssyeqyf | 27 | 0.2%
cassqrpsevgelff | 26 | 0.2%
cassfpgqgntqyf | 26 | 0.2%
Other values (13621) | 16352 | 95.9%

Most occurring characters

Value | Count | Frequency (%)
S | 39769 | 16.4%
G | 25361 | 10.4%
A | 25183 | 10.4%
F | 23629 | 9.7%
Y | 15999 | 6.6%
T | 15836 | 6.5%
C | 15813 | 6.5%
Q | 15569 | 6.4%
E | 13635 | 5.6%
L | 9990 | 4.1%
Other values (11) | 42175 | 17.4%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 242959 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 242959 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 242959 | 100.0%

TRAC
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 17050
Missing (%): 100.0%
Memory size: 133.3 KiB

TRBC
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 17050
Missing (%): 100.0%
Memory size: 133.3 KiB

Epitope
Text

Distinct: 576
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Memory size: 133.3 KiB

Length

Max length: 25
Median length: 9
Mean length: 9.667566
Min length: 8

Characters and Unicode

Total characters: 164832
Distinct characters: 20
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 259
Unique (%): 1.5%

Sample

1st row: FLKEKGGL
2nd row: ELAGIGILTV
3rd row: ELAGIGILTV
4th row: ELAGIGILTV
5th row: ELAGIGILTV

Value | Count | Frequency (%)
klggalqak | 5440 | 31.9%
gilgfvftl | 1650 | 9.7%
rakfkqll | 953 | 5.6%
avfdrksdak | 756 | 4.4%
llwngpmav | 644 | 3.8%
tfeyvsqpflmdle | 561 | 3.3%
ivtdfsvik | 512 | 3.0%
llldrlnql | 492 | 2.9%
nlvpmvatv | 480 | 2.8%
elagigiltv | 422 | 2.5%
Other values (566) | 5140 | 30.1%

Most occurring characters

Value | Count | Frequency (%)
L | 29199 | 17.7%
A | 19508 | 11.8%
G | 19161 | 11.6%
K | 16766 | 10.2%
V | 10287 | 6.2%
Q | 10172 | 6.2%
F | 9428 | 5.7%
T | 7253 | 4.4%
I | 6367 | 3.9%
P | 5642 | 3.4%
Other values (10) | 31049 | 18.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 164832 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 164832 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 164832 | 100.0%

MHC A
Text

MISSING 

Distinct: 102
Distinct (%): 0.6%
Missing: 1276
Missing (%): 7.5%
Memory size: 133.3 KiB

Length

Max length: 20
Median length: 11
Mean length: 11.034677
Min length: 6

Characters and Unicode

Total characters: 174061
Distinct characters: 23
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 30
Unique (%): 0.2%

Sample

1st row: HLA-B*08
2nd row: HLA-A*02
3rd row: HLA-A*02
4th row: HLA-A*02
5th row: HLA-A*02

Value | Count | Frequency (%)
hla-a*03:01 | 5667 | 35.9%
hla-a*02:01 | 5058 | 32.1%
hla-a*11:01 | 1310 | 8.3%
hla-b*08:01 | 989 | 6.3%
hla-b*07:02 | 394 | 2.5%
hla-a*02 | 329 | 2.1%
hla-a*01:01 | 298 | 1.9%
hla-a*24:02 | 280 | 1.8%
hla-dqa1*05:01 | 275 | 1.7%
hla-b*57:01 | 194 | 1.2%
Other values (92) | 980 | 6.2%

Most occurring characters

Value | Count | Frequency (%)
A | 29522 | 17.0%
0 | 28818 | 16.6%
1 | 18342 | 10.5%
H | 15774 | 9.1%
L | 15774 | 9.1%
- | 15774 | 9.1%
* | 15631 | 9.0%
: | 15327 | 8.8%
2 | 6654 | 3.8%
3 | 5907 | 3.4%
Other values (13) | 6538 | 3.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 174061 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 174061 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 174061 | 100.0%
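
The MHC A values come at different typing resolutions: "hla-a*02" (329 rows) sits next to "hla-a*02:01" (5058 rows), and the MHC B sample shows three-field names such as HLA-DRB5*01:01:01. If a consistent resolution is needed, one option is to truncate every name to the gene plus its first field. The helper below is a hypothetical sketch of that truncation; it discards allele detail, so it only suits analyses where first-field resolution is enough.

```python
from collections import Counter

def hla_first_field(allele):
    """Reduce an HLA name to gene plus first field, e.g. 'HLA-A*02:01' -> 'HLA-A*02'."""
    gene, star, fields = allele.partition("*")
    if not star:
        return allele.upper()            # no allele fields to trim
    first = fields.split(":", 1)[0]
    return f"{gene}*{first}".upper()

# Counts copied from the MHC A value table in this report (selected entries).
counts = {"hla-a*02:01": 5058, "hla-a*02": 329, "hla-a*03:01": 5667}

merged = Counter()
for allele, n in counts.items():
    merged[hla_first_field(allele)] += n

print(merged["HLA-A*02"])  # 5387
```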

MHC B
Text

MISSING 

Distinct: 54
Distinct (%): 3.6%
Missing: 15570
Missing (%): 91.3%
Memory size: 133.3 KiB

Length

Max length: 20
Median length: 14
Mean length: 12.982432
Min length: 8

Characters and Unicode

Total characters: 19214
Distinct characters: 22
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): 1.1%

Sample

1st row: HLA-DRB1*15:03
2nd row: HLA-DRB3*03:01
3rd row: HLA-DPB1*13:01
4th row: HLA-DRB5*01:01:01
5th row: HLA-DRB1*01:01:01

Value | Count | Frequency (%)
hla-dpb1*04:01 | 403 | 27.2%
hla-dqb1*06:02 | 229 | 15.5%
hla-a*02 | 154 | 10.4%
hla-drb1*04:01 | 150 | 10.1%
hla-drb1*07:01 | 135 | 9.1%
hla-a*02:01 | 102 | 6.9%
hla-dqb1*02:01 | 60 | 4.1%
hla-a*24:02 | 33 | 2.2%
hla-drb1*14:02 | 25 | 1.7%
hla-drb1*15:01 | 20 | 1.4%
Other values (44) | 169 | 11.4%

Most occurring characters

Value | Count | Frequency (%)
0 | 2662 | 13.9%
1 | 2191 | 11.4%
A | 1777 | 9.2%
H | 1480 | 7.7%
- | 1480 | 7.7%
L | 1480 | 7.7%
* | 1476 | 7.7%
: | 1337 | 7.0%
B | 1178 | 6.1%
D | 1125 | 5.9%
Other values (12) | 3028 | 15.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 19214 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 19214 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 19214 | 100.0%

Binding
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 133.3 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 17050
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 17050 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
1 | 17050 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
1 | 17050 | 100.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 17050 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 17050 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 17050 | 100.0%

task
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 133.3 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 68200
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TPP2
2nd row: TPP2
3rd row: TPP2
4th row: TPP2
5th row: TPP2

Common Values

Value | Count | Frequency (%)
TPP2 | 11101 | 65.1%
TPP1 | 4857 | 28.5%
TPP3 | 1092 | 6.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
tpp2 | 11101 | 65.1%
tpp1 | 4857 | 28.5%
tpp3 | 1092 | 6.4%

Most occurring characters

Value | Count | Frequency (%)
P | 34100 | 50.0%
T | 17050 | 25.0%
2 | 11101 | 16.3%
1 | 4857 | 7.1%
3 | 1092 | 1.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 68200 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 68200 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 68200 | 100.0%
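
The task column is imbalanced: TPP2 covers about two thirds of the rows and TPP3 only 6.4%. The sketch below recomputes the shares from the reported counts and derives inverse-frequency weights with the common n_samples / (n_classes * count) heuristic. The weighting scheme is an illustrative assumption; the report does not prescribe any handling of the imbalance.

```python
# Counts copied from the Common Values table of the task variable.
task_counts = {"TPP2": 11101, "TPP1": 4857, "TPP3": 1092}

total = sum(task_counts.values())
shares = {task: round(100 * n / total, 1) for task, n in task_counts.items()}

# Inverse-frequency weights: rarer classes receive larger weights.
n_classes = len(task_counts)
weights = {task: total / (n_classes * n) for task, n in task_counts.items()}

print(total, shares)  # 17050 {'TPP2': 65.1, 'TPP1': 28.5, 'TPP3': 6.4}
```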

Interactions

[Figure: interaction plots (4 panels)]

Correlations

[Figure: correlation heatmap]

 | TCR_name | Unnamed: 0 | task
TCR_name | 1.000 | 0.100 | 0.518
Unnamed: 0 | 0.100 | 1.000 | 0.711
task | 0.518 | 0.711 | 1.000
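
The High correlation alerts in the Overview line up with the two off-diagonal entries above 0.5 in this matrix. The sketch below lists the pairs at or above a cut-off. The value 0.5 is an assumption chosen because it separates the flagged pairs (0.518, 0.711) from the unflagged one (0.100); the report does not state the threshold it used.

```python
labels = ["TCR_name", "Unnamed: 0", "task"]
matrix = [
    [1.000, 0.100, 0.518],
    [0.100, 1.000, 0.711],
    [0.518, 0.711, 1.000],
]
THRESHOLD = 0.5  # assumed cut-off, not stated in the report

def high_pairs(labels, matrix, threshold):
    """Return (name_a, name_b, value) for each pair above the threshold."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):   # upper triangle only
            if abs(matrix[i][j]) >= threshold:
                pairs.append((labels[i], labels[j], matrix[i][j]))
    return pairs

print(high_pairs(labels, matrix, THRESHOLD))
# [('TCR_name', 'task', 0.518), ('Unnamed: 0', 'task', 0.711)]
```

Because Unnamed: 0 appears to be a row index, its association with task most likely reflects the file being ordered by task rather than a meaningful relationship.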

Missing values

[Figure: a simple visualization of nullity by column.]
[Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

First rows

(index) | Unnamed: 0 | TCR_name | TRAV | TRAJ | TRA_CDR3 | TRBV | TRBJ | TRB_CDR3 | TRAC | TRBC | Epitope | MHC A | MHC B | Binding | task
0 | 0 | 4 | TRAV38-2/DV8*01 | TRAJ40*01 | CAYRPPGTYKYIF | TRBV14*01 | TRBJ2-1*01 | CASSALASLNEQFF | NaN | NaN | FLKEKGGL | HLA-B*08 | NaN | 1 | TPP2
1 | 1 | 14 | TRAV38-1*01 | TRAJ48*01 | CAYTVLGNEKLTF | TRBV28*01 | TRBJ2-1*01 | CASSFTPYNEQFF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2
2 | 2 | 15 | TRAV12-2*01 | TRAJ42*01 | CAVAGYGGSQGNLIF | TRBV28*01 | TRBJ1-1*01 | CASSPQGLGTEAFF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2
3 | 3 | 16 | TRAV12-2*01 | TRAJ48*01 | CAVSFGNEKLTF | TRBV28*01 | TRBJ1-5*01 | CAEGQGFVGQPQHF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2
4 | 4 | 17 | TRAV12-2*01 | TRAJ42*01 | CAVTHYGGSQGNLIF | TRBV28*01 | TRBJ2-3*01 | CASLRSAVWADTQYF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2
5 | 5 | 18 | TRAV12-2*01 | TRAJ45*01 | CAGGGGGADGLTF | TRBV28*01 | TRBJ1-5*01 | CASTLTGLGQPQHF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2
6 | 6 | 19 | TRAV12-2*01 | TRAJ23*01 | CAVTWGGKLIF | TRBV28*01 | TRBJ1-1*01 | CASSFQGLGTEAFF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2
7 | 7 | 20 | TRAV12-2*01 | NaN | CCAVSIGFGNVLHCGF | TRBV27*01 | TRBJ2-1*01 | CASSFNDEQFF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2
8 | 8 | 21 | TRAV12-2*01 | TRAJ31*01 | CAVNNARLMF | TRBV27*01 | TRBJ2-3*01 | CASSPSGLAGGHTQYF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2
9 | 9 | 22 | TRAV12-2*01 | NaN | CCAATIGFGNVLHCGF | TRBV27*01 | TRBJ2-1*01 | CASSMTSYNEQFF | NaN | NaN | ELAGIGILTV | HLA-A*02 | NaN | 1 | TPP2

Last rows

(index) | Unnamed: 0 | TCR_name | TRAV | TRAJ | TRA_CDR3 | TRBV | TRBJ | TRB_CDR3 | TRAC | TRBC | Epitope | MHC A | MHC B | Binding | task
17040 | 17040 | 7873 | TRAV14/DV4*01 | TRAJ9*01 | CAMREGENGGFKTIF | TRBV7-9*01 | TRBJ1-4*01 | CASSLVGGTDEKLFF | NaN | NaN | RAKFKQLL | HLA-B*08:01 | NaN | 1 | TPP1
17041 | 17041 | 7875 | TRAV8-2*01 | TRAJ4*01 | CVVSEAGGYNKLIF | TRBV5-1*01 | TRBJ1-1*01 | CASSLGSGWEAFF | NaN | NaN | KLGGALQAK | HLA-A*03:01 | NaN | 1 | TPP1
17042 | 17042 | 7876 | TRAV13-2*01 | TRAJ52*01 | CAERVGAGGTSYGKLTF | TRBV6-3*01 | TRBJ2-2*01 | CASSYGFGGHNTGELFF | NaN | NaN | KLGGALQAK | HLA-A*03:01 | NaN | 1 | TPP1
17043 | 17043 | 7877 | TRAV8-6*01 | TRAJ10*01 | CAVSGWGLTGGGNKLTF | TRBV6-5*01 | TRBJ1-1*01 | CASTGPLNTEAFF | NaN | NaN | KLGGALQAK | HLA-A*03:01 | NaN | 1 | TPP1
17044 | 17044 | 7879 | TRAV22*01 | TRAJ37*01 | CAGSPSNTGKLIF | TRBV7-9*01 | TRBJ2-7*01 | CASSTSEGGLFYEQYF | NaN | NaN | GILGFVFTL | HLA-A*02:01 | NaN | 1 | TPP1
17045 | 17045 | 7881 | TRAV8-3*01 | TRAJ26*01 | CAVGARDYGQNFVF | TRBV7-3*01 | TRBJ2-2*01 | CASSLGTSGGTGELFF | NaN | NaN | RAKFKQLL | HLA-B*08:01 | NaN | 1 | TPP1
17046 | 17046 | 7884 | TRAV9-2*01 | TRAJ30*01 | CALLNRDDKIIF | TRBV5-1*01 | TRBJ1-1*01 | CASSYGTGENTEAFF | NaN | NaN | RAKFKQLL | HLA-B*08:01 | NaN | 1 | TPP1
17047 | 17047 | 7887 | TRAV19*01 | TRAJ17*01 | CALKLIKAAGNKLTF | TRBV4-1*01 | TRBJ1-2*01 | CASSTSTGTGYGYTF | NaN | NaN | RAKFKQLL | HLA-B*08:01 | NaN | 1 | TPP1
17048 | 17048 | 7888 | TRAV5*01 | TRAJ31*01 | CAEDNNARLMF | TRBV20-1*01 | TRBJ1-3*01 | CSARPQPVGNTIYF | NaN | NaN | GLCTLVAML | HLA-A*02:01 | NaN | 1 | TPP1
17049 | 17049 | 7890 | TRAV13-1*01 | TRAJ20*01 | CAASGYDYKLSF | TRBV5-6*01 | TRBJ1-1*01 | CASSLRDGSEAFF | NaN | NaN | RAKFKQLL | HLA-B*08:01 | NaN | 1 | TPP1